Adam

2023-07-17 15:24| 来源: 网络整理| 查看: 265

Note

The foreach and fused implementations are typically faster than the for-loop, single-tensor implementation. Thus, if the user has not specified BOTH flags (i.e., when foreach = fused = None), we will attempt defaulting to the foreach implementation when the tensors are all on CUDA. For example, if the user specifies True for fused but nothing for foreach, we will run the fused implementation. If the user specifies False for foreach but nothing for fused (or False for fused but nothing for foreach), we will run the for-loop implementation. If the user specifies True for both foreach and fused, we will prioritize fused over foreach, as it is typically faster. We attempt to use the fastest, so the hierarchy goes fused -> foreach -> for-loop. HOWEVER, since the fused implementation is relatively new, we want to give it sufficient bake-in time, so we default to foreach and NOT fused when the user has not specified either flag.

【本文地址】

Adam

Adam

今日新闻

推荐新闻